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ABSTRACT 

We describe and implement an exact, flexible, and computationally efficient algorithm for joint 
component separation and CMB power spectrum estimation, building on a Gibbs sampling framework. 
Two essential new features are 1) conditional sampling of foreground spectral parameters, and 2) joint 
sampling of all amplitude-type degrees of freedom (e.g., CMB, foreground pixel amplitudes, and global 
template amplitudes) given spectral parameters. Given a parametric model of the foreground signals, 
we estimate efficiently and accurately the exact joint foreground-CMB posterior distribution, and 
therefore all marginal distributions such as the CMB power spectrum or foreground spectral index 
posteriors. The main limitation of the current implementation is the requirement of identical beam 
responses at all frequencies, which restricts the analysis to the lowest resolution of a given experiment. 
We outline a future generalization to multi-resolution observations. To verify the method, we analyse 
simple models and compare the results to analytical predictions. We then analyze a realistic simulation 
with properties similar to the 3-yr WMAP data, downgraded to a common resolution of 3° FWHM. 
The results from the actual 3-yr WMAP temperature analysis are presented in a companion Letter. 
Subject headings: cosmic microwave background — cosmology: observations — methods: numerical 



1. INTRODUCTION 

Great advances have been made recently both in ex- 
perimental techniques for studying the cosmic microwave 
background (CMB) and in the measurements themselves. 
The angular power spectrum of temperature fluctuations 
has been charact erized over more than three decades 
in angular scale dHinshaw et al.l 120071 iKuo et all l2007t 
iReadhead et al J 12004 ), and even the E-mode polariza- 
tion spectrum has now been measured to some preci- 
sion (lAde et al.l 120071: iPaee et alJ l2007t iMontrov et "all 
[20M Isievers et alJ I2007Fl In the coming years, even 
greater improvements in sensitivity are expected, with 
the Planck nearing completion. 

As the sensitivity of CMB experiments improves, the 
requirements on the control and characterization of sys- 
tematic effects also increase. It is of critical importance 
to propagate properly the uncertainties caused by such 
effects through to the CMB power spectrum and cos- 
mological parameters, in order not to underestimate the 
final uncertainties, and thereby draw incorrect cosmolog- 
ical conclusions. 

A prime example of such systematic effects is non- 
cosmological foregrounds in the form of galactic and 
extra-galactic emission. With an amplitude rivaling that 
of the temperature signal over a significant fraction of 
the sky and completely dominating the polarization sig- 
nal over most of the sky, the diffuse signal from our own 
galaxy must be separated accurately from the CMB sig- 
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nal in order not to bias the cosmological conclusions. 
Further, the uncertainties in the separation process must 
be propagated through to the errors on the CMB power 
spectrum and cosmological parameters. 

These problems have been discussed extensively 
in the literature, and many different approaches to 
both power spectrum analysis and component sepa- 
ration have been proposed. Two popular classes of 
power spectrum esti mation methods are the pseudo- 
Qi es timators (e.g., IWright et al.l Il994t iHivon et al.l 
2002] iSzapudi et atl 120011) and" maximum- likelihood 
methods (e.g., iGorskil fl99l fl997t iBond et all floM) . 
For a revi e w and comparison of these methods, see 
lEfstathioul (|2004| ). Examples of component sep- 
aration methods are the Maximum Entropy Method 
(iBarreiro et al.ll2004tlBennett et al1 l2003U iHo bson et all 
119981: IStolvarov et alJ I2002L |2005h. the Inter n al Lin- 

(iBennett et all l2003bb 
Wiener fil- 



ear Combination method 
iTeemark etHI 120031: lEriksen et~al . ._ 
terin g (|Bouchet fc Gispertlll999t iTegmark fc Efstathio"ul 
I1996D. the In d epend ent Com po nent Analysis m ethod 
(iMaino et al l 12001 120031 iDonzelli et all 120061 : 
IStivoli et al.l I2006D. and direct likelihood estimation 
(iBrandt et all 11994k iGqrski et al.l 119961 : iBandav et all 
1996; Eri ksen et aljboOfl ). 

The final step in a modern cosmological analysis 
pipeline is typically to estimate a small set of parameters 
for some cosmological model, which in practice is done 
by mapping out the parameter posteriors (or likelihoods) 
using an MCMC code (e.g., CosmoMC; Lewis and Bri- 
dle 2002). To do so, one must establish an expression for 
the likelihood C(Ce) — P(d\Cg), where Cn is a theoreti- 
cal CMB power spectrum and d are the observed data. 
It is therefore essential that the methods used in the base 
analysis pipeline (e.g., map making, component separa- 
tion, power spectrum estimation) allow one to estimate 
this function both accurately and efficiently. 
A particularly appealing framework for this task is the 
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CM B Gibbs sampl e r, pio neered by Uewell et all ()2004f ) 
and IWandelt et all (|2004D . While a brute-force CMB 
likelihood evaluation code must invert a dense signal- 
plus-noise covariance matrix, C = S + N, at a computa- 
tional cost of C(7Vp ix ), 7Vpi X being the number of pixels in 
the data set, the Gibbs sampler only requires the signal 
and noise covariance matrices separately. Consequently, 
the algorithmic scaling is dramatically reduced, typically 

to either O(N^) or 0(N* iK ) for data with white or cor- 
related noise, respectively. 

In addition to being a highly efficient CMB likeli- 
hood evaluator in its own right, as de monstrated by 
several previous analyses of real data (|Q'Dwver et al.l 
12004 lEriksen et al.|[2007aUbh . the Gibbs sampler also of- 
fers unique capabilities for propagating systematic uncer- 
tainties end-to-end. Any effect for which there is a well- 
defined sampling algorithm, either jointly with or con- 
ditionally on other quantities, can be propagated seam- 
lessly through to the final posteriors. One example of this 
is beam uncertainties. Given some stochastic description 
of the beam, for instance a mean harmonic space profile 
and an associated covariance matrix, one could sample at 
each step in the Markov chain one particular realization 
from this model and use the resulting beam for the next 
CMB sampling steps, allowing for a short burn-in period. 
The CMB uncertainties will then increase appropriately. 
Similar approaches could be taken for uncertainties in 
gain calibration and noise estimation. 

However, rather than simply propagating a particular 
error term through the system, one often wants to es- 
timate the characteristics of the effect directly from the 
data. In that case, a parametric model P(6\d), 9 be- 
ing a set of parameters describing the effect, must be 
postulated. Then, if it is both statistically and compu- 
tationally feasible to sample from this distribution, the 
effect may be included in the joint analysis, and all joint 
posteriors will respond appropriately. 

In this paper, we describe how non-cosmological 
frequency-dependent foreground signals may be included 
in a Gibbs sampler. In this framework the CMB signal is 
assumed Gaussian and isotropic, while the foregrounds 
are modeled either in terms of fixed spatial templates 
(e.g., monopoles, dipoles, low/high- frequency observa- 
tions) or in terms of a free amplitude and spectral re- 
sponse function at each pixel. Our current code assumes 
identical angular resolution for all frequency bands, but 
we outline in £0 how the algorithm may be generalized 
to handle multi-resolution experiments. 

Already with the present algorithm, we are able to 
perform a complete Bayesian joint CMB and foreground 
analysis of current CMB experiments on large angular 
scales. For example, in the present paper we demonstrate 
the algorithm on a realistic simulation corresponding to 
the 3-yr WMAP data. At an angular resolution of 3° 
FWHM, we are able to produce the exact likelihood up to 
£ ~ 50-60, into the regime where a cruder likelihood de - 
scription is likely to be acceptable ( Eriksen et al.ll2 007a'). 
Further, in a companion paper (jEriksen et al.ll2007cfl we 
analyze the real 3-yr WMAP data with the same tool, 
providing for the first time a complete set of physically 
motivated foreground posterior distributions of the ob- 
served microwave sky, together with their impact on cos- 
mological parameters. 



2. REVIEW OF BASIC ALGORITHMS 

The algorithm developed in this paper is a essen- 
tially a hybrid of two previous alg orithms, namely th e 
CMB Gibbs sampler dev eloped bv Uewell et alJ (|2004 h 
IWandelt et "all (|2004[) andlEriksen et alJ (l2004bl)7 and the 
foregr ound MCMC sampler developed bv lEriksen et al.l 
(2006). In this section, we review these algorithms, em- 
phasizing an intuitive and pedagogical introduction to 
the underlying ideas. In the next section we present the 
extensions required to make the hybrid code functional. 

Note that while we discuss temperature measure- 
ments only in this paper, the methodology for analyz- 
ing polarization measurements is com plet ely analogous. 
For exa mple, see lLarson et alJ (|2007l ) and lEriksen et all 
(|2007ph for details on polarized power spectrum analysis 
through Gibbs sampling. 

2.1. The CMB Gibbs sampler 

We first review the Gibbs sampler for CMB tempera- 
ture measurements. 

2.1.1. The CMB posterior 

We choose our first data model to read 

d = s + n, (1) 

where d are the observed data, s is the CMB sky sig- 
nal, and n is instrumental noise. Complications such as 
multi-frequency observations and beam convolution will 
be introduced at a later stage. 

We assume both the CMB signal and noise to be Gaus- 
sian random fields with vanishing mean and covariance 
matrices S and N, respectively. In harmonic space, 
where s = J2e m a imXimi the CMB covariance matrix 

is given by Ce m ,t> m > = {a\ m ai> m ') = C^Su'S-mm', C e be- 
ing the angular power spectrum. The noise matrix N is 
left unspecified for now, but we note that for white noise 
it is diagonal in pixel space, Nij = crfSij, for pixels i and 
j and noise variance of. 

Our goal is to estimate both the sky signal s and the 
power spectrum Ce, which in a Bayesian analysis means 
to compute the posterior distribution P(s,Cg\d). By 
Bayes' theorem, this distribution may be written as 

P(a,Ct\d) ocP(d|s,Q)P(s,Q) (2) 
ocP(d|s,Q)P(s|Q)P(Q), (3) 

where P{Cg) is a prior on Ci, which we take to be uni- 
form in the following. Our final power spectrum distri- 
bution may thus be interpreted as the likelihood, and 
integrated directly into existing cosmological parameter 
MCMC codes. Since we have assumed Gaussianity, the 
joint posterior distribution may thus be written as 

21+1 

P( S , C,\d) OC e -i(d-s)*N-(d- S ) -Q £l_fl m)i 

I C^ 

(4) 

where a t = E m =-i l a ^m| 2 is the angular power 

spectrum of the full-sky CMB signal. 

2.1.2. Gibbs sampling 

In principle, we could map out this distribution over a 
grid in s and Cg, and the task would be done. Unfortu- 
nately, since the number of grid points required for such 
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an analysis scales exponentially with the number of free 
parameters, this approach is not feasible. 

A potentially much more efficient approach is to map 
out the distribution by sampling. However, direct sam- 
pling from the joint distribution in Equation [4] is diffi- 
cult even from an algorithmic point of view alone; we 
are not aware of any textbook approach for this. And 
even if there were, it would most likely involve inverses 
of the joint S + N covariance matrix, with a prohibitive 
0(Np ix ) scaling, in order to transform to the eigenspace 
of the system. 

This is the s i tuatio n in which iJewell et al.l (|2004f ) and 
IWandelt et al.l (|2004h proposed a particular Gibbs sam- 
pling scheme. For a general intr o ducti on to the algo- 
rithm, see, e.g., iGelfand fc Smlthl (fl990l ). In short, the 
theory of Gibbs sampling tells us that if we want to sam- 
ple from the joint density P(s, Ce |d), we can alternately 
sample from the respective conditional densities as fol- 
lows, 



G, 



i+X 



P(s|C?,d) 
P(Q|s i+ \d) 



(5) 
(6) 



Here <— indicates sampling from the distribution on the 
right-hand side. After some burn-in period, during which 
the samples must be discarded, the joint samples (s 1 , C\) 
will be drawn from the desired density. Thus, the prob- 
lem is reduced to that of sampling from the two condi- 
tional densities P(s\Ce,d) and P(Cg\s, d). 

2.1.3. Sampling algorithms for conditional distributions 

We now describe the sampling algorithms for each 
of these two conditional distributions, starting with 
P(Ci\s,d). First, note that P(C e \s,d) = P(C e \s); if we 
already know the CMB sky signal, the data themselves 
tell us nothing new about the CMB power spectrum. 
Next, since the sky is assumed Gaussian and isotropic, 
the distribution reads 



P(C e 



l B t s - 



2£+l fj_ 
2 C f 



s cx 



(7) 



c: 



which, when interpreted as a function of C'i, is known 
as the inverse Gamma distribution. Fortunately, there 
exists a simple textbook sampling alg orithm for this dis- 
tribution (e.g., Eriks en et alj l200 4b) , and we refer the 
interested reader to the previous papers for details. 

The sky signal algorithm is even simpler from a statis- 
tical point of view, although more involved to implement. 
Defining the so-called mean-field map (or Wiener filtered 
data) tobes = (S _1 +N _1 ) _1 N _1 d, the conditional sky 
signal distribution may be written as 

P(s|Q,d)ocP(d| S ,Q)P(s|Q) (8) 

x e -i(d-s) t N- 1 (d-s) g-is^-'s 



jls-sJ'fS-'+N-'Jls-s) 



(9) 

^g-S^-^V" A---V. (10) 

Thus, P(s|C^,d) is a Gaussian distribution with mean 
equals to s and a covariance matrix equals to (S -1 + 
N- 1 )- 1 . 

Sampling from this Gaussian distribution is straight- 
forward, but computationally somewhat cumbersome. 
First, draw two random white noise maps loq and lo\ with 
zero mean and unit variance. Then solve the equation 

[S" 1 + INT 1 ] s = N _1 d + S"*u; + N - ^wi. (11) 



for s. Since the white noise maps have zero mean, one 
immediately sees that (s) = s, while a few more calcula- 
tions show that (ss*) = (S" 1 + N -1 ) -1 . 

The problematic part about this sampling step is the 
solution of the linear system in Equation II II Since this 
a ~ 10 6 x 10 6 system for current CMB data sets, it can- 
not be solved by brute force. Instead, one must use a 
method called Conjugate Gradients (CG), which only 
requires multiplication of the coefficient matrix on the 
left-hand side, not inversion. For details on these com- 
put ations, togeth e r with some ideas on preconditioning, 
see lEriksen et afl (|2004bf ). 

2.1.4. Generalization to multi-frequency data 

For notational transparency, the discussion in the pre- 
vious sections was limited to analysis of a single sky map, 
and did not include the effect of an instrumental beam. 
We now review the full e quations for the general case. 
See lEriksen et al.l ()2004bl ) for full details. 

Let d v denote an observed sky map at frequency ^, 
N„ its noise covariance matrix, and A„ convolution with 
the appropriate instrumental beam response. Equation 
[TT1 then generalizes to 



s = 



(12) 



Note that we now draw one white noise map for each 
frequency band, w v . The sampling procedure for P(Cg\s) 
is unchanged. 

2.1.5. Computational considerations 

Finally, we make two comments regarding numerical 
stability and computational expense. First, note that the 
elements of s have a variance equal to the CMB power 
spectrum, which goes as Cg <~ £~ 2 . To avoid round-off 
errors over the large dynamic range in the solution, it is 
numerically advantageous to solve first for x = S _ 2S in 
the CG search, and then to solve (trivially) for s. The 
system solved by CG in practice is thus 



l + S'^AtN^A./S* 



J2 AtN- x d„ + Lu Q +S L * A - N - 



(13) 



Second, solving this equation by CG involves multipli- 
cation with expression in the brackets on the left-hand 
side, and therefore scales as the most expensive operation 
in the coefficient matrix. For white noise, N^- = aiSij, 
this is the spherical harmonic transform required between 
pixel (for noise covariance matrix multiplication) and 
harmonic (for beam convolution and signal covariance 

matrix multiplication) space, with a scaling of O(N^). 
For correlated noise, it is the multiplication with a dense 
x i 

ingofO(iV p 2 ix ). 



Api x x iVpi x inverse noise covariance matrix, with a scal- 



2.2. The foreground sampler 
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The previous section described how to sample from 
the exact CMB posterior P(s,Ce\d) by Gibbs sampling. 
In this section, we very briefly review the algorithm for 
sampl ing general sky signals presented bv lEriksen et all 

First we define a parametric frequency model for the 
total sky signal, S v {0), 9 representing the set of all free 
parameters in the model. A simple example would be 

S{T cmh ,A s ,(3 s ) = T cmb + A s a(v) i-^j , where T cmb is 

the CMB temperature, A s is the synchrotron emission 
amplitude relative to a reference frequency v Q , (3 S is the 
synchrotron spectral index, and a(v) is the conversion 
factor between antenna and thermodynamic temperature 
for differential measurements. Note that no constraints 
are imposed on the form of the spectral model in general, 
beyond the fact that it should contain at most N v — 1 free 
parameters, N v being the number of frequency bands of 
the experiment. In practice, one should also avoid models 
that contain nearly degenerate parameters. 

Our goal is now to compute the posterior distribution 
P(9\d) for each pixel. For this to be computationally fea- 
sible, we make two assumptions. First, we assume that 
the noise is uncorrelated between pixels, and second, that 
the instrumental beams are identical between frequency 
bands. If so, the data may be analyzed pixel-by-pixel, 
and the likelihood for a single pixel simply reads 

- 21n£(0) = x 2 (8) = £ ( ^~l M ) 2 - ( 14 ) 

The posterior is as usual given by P(9\d) oc C{9)P{9), 
where P(9) is a prior on 9. Given this likelihood and 
prior, it is straightforward to sample fro m P(9\d), for 
insta nce by Metropolis-Hastings MCMC (Eriks en et al.l 
2006) , or by inversion sampling as described later in this 
paper. 

3. JOINT CMB AND FOREGROUND SAMPLING 

The main goal of this paper is to merge the two algo- 
rithms described in ^2.11 and 12.21 into one joint CMB- 
foreground sampler, allowing us to estimate the joint pos- 
terior P(s, Cg, 9\d). In this paper we focus on a matched 
beam response experiment, for which A. v = A, which 
is sufficient for low-resolution analysis of high-resolution 
experiments such as WMAP and Planck. 

3.1. Data model and priors 
We define the joint data model to be 

M N 

d u = As + ^2 a u ,iti + b ]fj(")fj + 

(15) 

+ ^c t g t (v;9 t ) + n, y . 

fc=i 

The first term on the right-hand side is the CMB sky sig- 
nal. The second term is a sum over M spatial templates, 
ti, each having a free amplitude a v> i at each frequency, 
for example monopole and dipole components. The third 
term is a sum over N spatial templates, fj with a fixed 
frequency scaling fj{v) and a single o verall amplitude 
6,-, fo r example the H a template (e.g., Dickinso n et al.l 
2003) coupled to a power law spectrum with free-free 



spectral index of /3g = —2.15. Such spatial templates are 
a way of incorporating constraints on the sky from other 
measurements. Their value depends on the validity of 
assumptions about spectra over large frequency ranges 
(e.g., H Q as a proxy for free-free emission at CMB fre- 
quencies). Nevertheless, CMB experiments with too few 
frequencies to constrain foregrounds adequately on their 
own require such templates to provide additional con- 
straints. Both ti and fj are assumed to be convolved to 
the appropriate angular resolution of the experiment. 

The fourth term, the most important novel feature of 
this paper, is a sum over K foreground components each 
given by an overall amplitude Ck (p) and a frequency spec- 
trum gk(y \ 9k{p)) at each pixel p. The spectral parame- 
ters 9k{p) may or may not be allowed to vary from pixel 
to pixel. By allowing independent frequency spectra at 
every single pixel, the model is very general, and capable 
of describing virtually any conceivable sky signal. 

The fifth and last term, n„, is instrumental noise. 

In the current implementation of our codes, we allow 
only foreground spectra parametrized by a single spectral 
index, g(y\ (3) — Q(v)(y / Vo)^ , where Q(v) is an arbitrary, 
but fixed, function of frequency and v$ is a reference 
frequency. A typical example is synchrotron emission, 
which may be modelled accurately over a wide frequency 
range by a simple power law (in intensity, flux density, 
or antenna temperature units) with a spatially varying 
spectral index. The CMB is most naturally described in 
terms of thermodynamic units, and we adopt this con- 
vention in our codes. The corresponding synchrotron 
model is therefore g(v; (3) = a(v)(y / 'vo) 13 , where a(i>) is 
the antenna-to-thermodynamic conversion factor. 

As in any Bayesian analysis, we must adopt a set of 
priors for the parameters under consideration. For this 
paper, we choose the prior most widely accepted in the 
stati stical community, namely Jeffreys' ignorance prior 
fe.g.. lBox fc fiaol[l992T) . This prior is given by the square 
root of the Fisher information measure, Pg oc \/\F\o9 = 

^/\ — d 2 ln£/d 2 9\. Its effect is essentially to "normal- 
ize" the parameter volume relative to the likelihood, and 
make the likelihood so-called "data translated" . We will 
return to the effect of this prior in §4.21 We impose an 
additional multiplicative prior on spectral parameters, 
either a top-hat or a Gaussian. 

For amplitude-type degrees of freedom, the ignorance 
prior works out be the usual flat prior, but for non-linear 
parameters, e.g., spectral indices, it is non- uniform. In 
particular, for the power law spectrum parametrized by 
a spectral index (3 described above, it reads P(J3) oc 

VEMGW^WJ^¥HW^- The difference be- 
tween this and a flat prior will be demonstrated in i j4.2l 
An important special case is the CMB power spectrum, 
for which we adopt a uniform prior despite the fact that 
the corresponding density is non-Gaussian. The main 
reason for doing so is that most cosmological parameter 
estimation codes expect the CMB likelihood, rather than 
the CMB posterior. 

3.2. Sampling from the joint posterior distribution 

Having defined our data model and priors, the goal 
is now to estimate the joint CMB-foreground posterior 
P(s, Ct, a u ,i, bj, Ck, 9k\d). This is achieved through the 
following straightforward generalization of the previous 
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Gibbs sampling scheme, 
{s,a„ <i ,b j ,c k } i+1 <- P(s,a„,i,bj,c k \Ci,ei,d) 



(16) 

91+ 1 <- P(^|s' i+1 ,<+ 1 ,6; +1 ,4+ 1 ,d) (17) 



c; 



•t+i 



P(Q|s 4+i ) 



(18) 



Explicitly, all amplitude-type degrees of freedom are 
sampled jointly with the CMB sky signal using a gen- 
eralization of Equation 111! while all non-linear spectral 
parameters are sampled conditionally by inversion sam- 
pling, as described in £)3.2.2I The conditional CMB power 
spectrum sampling algorithm is unchanged, since it only 
depends on the CMB sky signal. 

3.2.1. Amplitude sampling 

We first describe the algorithm for sampling from the 
conditional amplitude density, P(s, a v ,i, bj, c k \Ci, 9k, d) 

Conditional sampling of amplitudes — In principle, we 
could take further advantage of the Gibbs sampling ap- 
proach, and sample each of s, a v ^, bj and c,t condition- 
ally, given all other parameters, including the amplitudes 
not currently being sampled . This method was briefly 
described by lEriksen et alJ (|2004tJ ) for monopole and 
dipol e sampling, and later used for actual an a lysis b y 
both lO'Dwver etai] (|2004D and lEriksen et ail (|2007bD . 
Briefly stated, this approach simply amounts to subtract- 
ing each of the signals that is conditioned upon from the 
data, and using the residual map to sample the remain- 
ing parameters in place of the full data set. Its main 
advantage is highly modularized, simple and transparent 
computer code. 

However, for general applications this is a prohibitively 
inefficient sampling algorithm due to poor mixing prop- 
erties and long Markov chain correlation lengths. The 
problem is due to strong correlations between the vari- 
ous amplitudes. Consider for instance a model including 
a CMB sky signal, monopole and dipole components, and 
a foreground template. Note that the latter has both 
a non-zero monopole and dipole and also smaller-scale 
structure. 

The conditional sampling algorithm would then go as 
follows: First subtract the current monopole, dipole and 
foreground components from the data, and sample the 
CMB sky based on the residual map. The uncertainties 
in this conditional distribution are both cosmic variance 
and instrumental noise. Second, subtract the recently 
sampled CMB signal and foreground template from the 
data, and sample the monopole and dipoles of the resid- 
ual. The only source of uncertainty in this conditional 
distribution is instrumental noise alone, and the next 
sample therefore equals the previous state plus a noise 
fluctuation. For high signal-to-noise data the instrumen- 
tal noise uncertainty in a single all-sky number such as 
the monopole and dipole amplitude is very small indeed, 
and the new sample is therefore essentially identical to 
the previous. Finally, subtract the CMB signal and the 
monopole and dipole from the data, and sample the fore- 
ground template amplitude of the new residual. Again, 
with high signal-to-noise data the new amplitude is vir- 
tually identical to the previous. 

The failure of this approach stems from the fact that 
the main uncertainty in the monopole, dipole and tem- 
plate amplitudes is not instrumental noise, but rather 



CMB cosmic variance coupled from template structures. 
This component is not explicitly acknowledged in the 
conditional template sampling algorithms when condi- 
tioning on the CMB signal, but only implicitly through 
the Gibbs sampling chain. The net result is an extremely 
long Markov chain correlation length. 

The reasons t his conditional approac h worked well in 
the a nalyse s of lEriksen et all (l2004bD . lO'Dwver et all 
(|2004h and lEriksen et alJ (|2007bF cases were different, 
and somewhat fortuitous: Only the monopole and dipole 
components were included in the 1-yr WMAP temper- 
ature analysis, which couple only weakly to the low-£ 
CMB modes with a relatively small sky-cut. No fore- 
ground template sampling step as such was included, 
which would couple strongly to both the monopole, 
dipole and CMB signals. For the polarization analy- 
sis of Eri ksen et alJ |2007b), in which foreground tem- 
plates were indeed included, a different effect came into 
play, namely the very low signal-to-noise ratio of the 3-yr 
WMAP polarization data. At this signal-to-noise, even 
conditional sampling works well. 

Joint sampling of amplitudes — The solution to this prob- 
lem is to sample all amplitude-type degrees of freedom 
jointly from P(s, a v ^, bj, c k \Ct, 6 k , d). This is a four- 
component Gaussian distribution with mean x and co- 
variance matrix A. The required sampling algorithm is 
therefore fully analogous to that described in £ 32.1.31 for 
the CMB sky signal. The remaining task is to generalize 
the expressions for x and A. 

To keep the notation tractable, we first define a sym- 
bolic four-element block vector of all amplitude coeffi- 
cients, x = (s, a Uj i, bj, Cfe)*. The first block of x con- 
tains the harmonic coefficients of s, the second block 
contains a v> i for all frequencies and templates, the third 
contains bj for all templates with a fixed spectrum, 
and the fourth contains the pixel amplitudes c k for all 
pixel- by-pixel foreground components. In total, x is 
an ((£ max + l) 2 + M AT band + N + K A pix )-element vec- 
tor. We also define a corresponding response vector 
u„ = (1, tj, fj(v)ij, gfc(z/; Ok)Y, such that the data model 
in Equation [lj] may be abbreviated to d„ = x • + n u . 

With this notation, the joint amplitude distribution 
reads 

P(s, a Vji , bj,c k \Ci, 9 k , d) cx P(d|s, Ct,a v ,i, bj,c k )P(s\Cz) 



cx e 2 



5 £„(d„-x-u„)*N;7 1 (d„-x-u„) 



e 2 



A 



Here we have implicitly defined the symbolic 4x4 inverse 
covariance block matrix 

A*N _1 T A*N _1 F A*N -1 G ' 
T t N -i T T t N -i F t*N- x G 

F'N^T F*N -1 F F*N _1 G 
G*N- 1 T G'N-iF G'N^G 

(19) 

(see Appendix lAlfor explicit definitions of each element in 
this matrix) and a corresponding four-element symbolic 
block vector for the Wiener-filter mean, 



r s- 1 + a*n-!a 
T t N -l A 

F*N _1 A 
G*N- X A 



x = A 



E,A*N- 1 d 



(20) 
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Fig. 1. — Comparison of trace plots generated by the joint (solid 
curve) and the conditional (dashed curve) template amplitude sam- 
pling algorithms for a simulated data set consisting of a CMB sky 
signal, a monopole component and a synchrotron template compo- 
nent. The true input template amplitude is shown as a horizontal 
dotted line. 

The sampling algorithm for this joint distribution is 
now fully analogous to the one described in ij2.1.3l by 
Equation 1121 1) Draw -/Vband + 1 white noise maps with 
zero mean and unit variance; 2) form the Wiener filter 
mean plus random fluctuation right-hand side vector, 



E.A'N-M + Ciwo + E.AtN/ 



t l -N" 



E„ /»fjN-M + £„ J»fjN-^ 

E* ^N^d + e„ gkte 

and 3) solve the set of linear equations, 



X 



= b. 



(21) 



(22) 



The solution vector x then has the required mean x and 
covariance matrix A. Again, for numerical stability it 
is useful to multiply both sides of Equation [22] by the 
block-diagonal matrix P = diag(C~2 ; 1, \ } l) ; and solve 
for Px by CG. 

To demonstrate the difference in mixing efficiency be- 
tween conditional and joint amplitude sampling, Figure[T] 
shows two trace plots for a high signal-to-noise simula- 
tion that included a CMB, a monopole, and a foreground 
template component. While the joint sampler instanta- 
neously moves into the right regime, and subsequently 
efficiently explores the correct distribution, the condi- 
tional sampler converges only very slowly toward the cor- 
rect value. The associated long Markov chain correlation 
length makes this approach unfeasible for general prob- 
lems. 

Pre conditioning- — Th e performance of the CG algorithm 
(see lShewc huk (1994) for an outstanding introduction to 
this method) depends sensitively on the condition num- 
ber of the coefficient matrix A, i.e., the ratio of the largest 
to the smallest eigenvalue. In fact, the algorithm is not 
guaranteed to converge at all for poorly conditioned ma- 
trices, due to increasing round-off errors in cases that 
require many iterations. 

The condition number of the regularized A matrix is 
essentially the largest signal-to-noise ratio of any compo- 
nent in the system, which in practice means that of the 



CMB quadrupole or the template amplitudes. For cur- 
rent and future CMB experiments, such as WMAP and 
Planck, the integrated signal-to-noise of these large-scale 
modes is very large. It is therefore absolutely essential to 
construct an efficient preconditioner, M « A, to decou- 
ple these modes brute-force, M _1 .4x = M J b, simply 
in order to achieve basic convergence. 

For the 4x4 coupled system described above, we adopt 
a three-stage preconditioner. First, for the low-£ CMB 
components we explicitly c ompute all elements o f A up 
to some £ preC ond ~ 20-70 (|Eriksen et al.ll2004bD . This 
\ow-£ block is then coupled to the template amplitudes 
in a symbolic 3x3 preconditioner, 



M n 



1 + A'N- : A 

T * N -i A 

F'N _1 A 



A*N _1 T A'N _1 F 
T*N _1 T T*N- X F 
F'N _1 T F*N _1 F 



(23) 



The elements in this matrix are computed by trans- 
forming each object individually into spherical harmonic 
space, including modes only up to £ pre cond) and then per- 
forming the sums explicitly. (Note that the seemingly in- 
tuitive proposition of computing the template elements 
in pixel space, as opposed to in harmonic space, is flawed; 
unless all elements are properly bandwidth limited, a 
non-positive definite preconditioning matr ix will result.) 
For exa mples of such computations, see lEriksen et al.l 
(|2004bD . 

The second part of our preconditioner regularizes the 
high-£ CMB components, and consists of the diago- 
nal elements Ae m .£'m> 5u'd mm > from £ piecon d + 1 to £ max 
(jEriksen et all l2004b[ ). The third part of our precon- 
ditioner covers the single pixel-pixel foreground ampli- 
tudes, which have low signal-to-noise ratio, and are pre- 
conditioned with the corresponding diagonal elements of 
A only, G*N- 1 G. 

For a typical low-resolution WMAP3 application (five 
frequency channels degraded to N s id e = 64 and 3° 
FWHM resolution and regularized with 2/iK RMS white 
noise), we find that including only the diagonal elements 
in the above matrix can bring the fractional CG residual 
down to ~ 10~ 4 , while the recommended convergence cri- 
terion for single-precision data is 10 -6 . Thus, including 
the CMB-tcmplate cross-terms in the \ow-£ CMB pre- 
conditioner in Equation 1231 is not just a question of per- 
formance for the signal-to- noise levels of WMAP; it is 
required in order to converge at all. The total number of 
CG iterations is typically < 200 for the same application 
with the previously described three-level preconditioner. 
For some further pr omising idea s on pr econditioning for 
similar systems, see lSmith et al.l (|2007t ). 

Imposing linear constraints — A useful addition to the 
above formalism is the possibility of imposing linear con- 
straints on one or more of the parameters. For instance, 
if it is possible to calibrate the absolute offset of one 
frequency band by external information, for instance us- 
ing knowledge about the instrument itself, it would be 
highly beneficial to fix the corresponding monopole value 
accordingly. Another constraint may be to exclude tem- 
plate amplitude combinations with a given frequency 
spectrum, in order to disentangle arbitrary offsets at each 
frequency from the absolute zero-level of a given fore- 
ground component. 
In the present code, we have implemented an option for 
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imposing linear constraints on the template amplitudes 
a = {av,i\ on the form 

^^ = ^•3 = 0, (24) 

where q fe = {q„ A }, k — 1, . . . ,N C , are constant orthog- 
onality vectors, and N c is the number of simultaneous 
linear constraints. For example, if we want to obtain a 
solution with a fixed monopole amplitude at frequency 
pq, we would set q U j = 8 u ^ a 8. ifi . 

The total dimension of the template amplitude vector 
space is D = MiVbandi M being the number of free tem- 
plates at each band. Within this space, the constraint 
vectors q span an iV c -dimensional sub-space V to which 
the CG solution must be orthogonal; a must lie in the 
complement of V, denoted V c . 

To achieve this, we construct a projection operator P : 
R D — > V c by standard Gram-Schmidt orthogonalization, 
which is a D x (D— 7V c )-dimensional matrix P. To impose 
the constraints defined by Equation [M] on the the final 
CG solution, Equation [22] is rewritten as 

PM _1 PP*x = P*b, (25) 

which is solved as before. Corresponding elements in the 
preconditioner are similarly modified in order to main- 
tain computational efficiency. 

3.2.2. Spectral parameter sampling 

With the amplitude sampling equations for 
P(s, a u ,i, b 3 Ck\Ci, 9k, d) in hand, the only missing 
piece in the Gibbs sampling scheme defined by Equa- 
tions [T6HT8j is a spectral parameter sampler for 
P(9k\s, a v .i, bj , Ck, d ). In the FGFit code presented by 
lEriksen et al.1 (|2006f ). this task was done by Metropolis- 
Hastings MCMC, a very general technique that can 
sample from almost any multi-variate distribution. 
However, it has two disadvantages. First, hundreds of 
MCMC steps may be required to generate two uncor- 
rected samples, making the process quite expensive. 
Second, and even worse for our application, the chains 
may need to "burn in" at each main Gibbs iteration, 
because the amplitude parameters have changed since 
the last iteration. Proper monitoring of these issues is 
difficult for problems with tens of thousands of pixels 
with very different signal-to-noise ratios. 

Therefore, we have replaced the MCMC sampler with 
a direct sampler, specifically a standard inversion sam- 
pler, in the present version of our codes. While this al- 
gorithm is only applicable for univariate problems, it is 
also quite possibly the best such sampler, as it draws 
from the exact distribution, and no computation of ac- 
ceptance probabilities is needed. The algorithm is the fol- 
lowing: First, compute the conditional probability den- 
sity P(x\9), where x is the currently sampled parame- 
ter and 9 denotes the set of all other parameters in the 
model. In our application, P(x\9) is the normalized prod- 
uct of the likelihood £ in Equation [14] and any prior we 
wish to impose. Then compute the corresponding cumu- 
lative probability distribution, F(x\9) — /_ P(y\9)dy. 
Next, draw a random number u from the uniform distri- 
bution U[0, 1]. The desired sample from P(x\9) is given 
by F(x\9) = u. 

For multivariate problems we use a Gibbs sampling 
scheme to draw from the joint distribution, and sample 



each parameter conditionally. For example, if we want to 
allow free amplitudes (c s and Cd) and spectral indices (/3 S 
and (3d) for both synchrotron and thermal dust emission, 
the full sampling scheme reads 

{s,c s , Cs r +i «- p(s,c s ,c s |q,/?:,/?:,d) (26) 

C; +1 <- P(C e \s i+x ) (27) 
(3i +1 ^P(0 s \ S i+ \,4+\ci +1 ) ^d) (28) 



P7 - P(/3 d \s* + \,cl + \cl + \f3l,d) (29) 



Note that it can be beneficial to iterate the latter two 
equations more than once in each main Gibbs loop, in or- 
der to reduce the correlations between consequtive sam- 
ples cheaply. Typically, with two moderately correlated 
spectral indices we run ~ 3 spectral index iterations for 
each main Gibbs iteration. 

While this approach results in quite acceptable mix- 
ing properties for reasonably uncorrelated parameters 
(e.g., synchrotron and dust spectral indices), other and 
more efficient methods may be required for more compli- 
cated problems. Viable alternatives for such situations 
are, e.g., rejection sampling or even standard Metropolis- 
Hastings MCMC with proper burn-in monitoring. The 
details of the particular sampling algorithm are of little 
importance as long as it can be proved that the method 
produces samples from the correct conditional distribu- 
tion. 

4. M ARGIN ALIZ ATION , PRIORS AND DEGENERACIES 

The algorithm described in Sj3] provides samples from 
the full joint posterior P(s, Ci, a u> i, bj, Ck, #fe|d). From 
these multivariate samples we estimate each parameter 
individually by marginalizing over all other parameters 
in the system and reporting, say, the marginal posterior 
mean and standard deviation. 

This is straightforward, but there are subtleties and 
care is required. Before applying the method to simu- 
lated data in $5] and [6] therefore, we discuss marginal- 
ization, priors, degeneracies, and high-dimensional prob- 
ability distributions. 

Much of the following deals with the degeneracy be- 
tween unknown offsets (or monopoles) at each band and 
the overall zero-level of a foreground component with a 
free amplitude at each pixel. The same observations ap- 
ply to any full-sky template with a free amplitude at each 
band (e.g., the three dipoles). For simplicity we discuss 
only offsets below. For the same reason, we neglect the 
antenna-to-thermodynamic temperature conversion fac- 
tor. When explicit formulae are derived, the simplified 
and more readable versions are given in the text; full 
expressions are given in Appendices [B] and [C] 

It turns out that the degeneracy between unknown off- 
sets and the foreground zero-level has almost no effect 
on the CMB component. For the CMB, the relevant 
quantity is the sum over all foregrounds, not internal 
degeneracies among different foregrounds. If one cares 
only about separating the CMB from foregrounds, and 
not the foregrounds themselves, much of the following 
can be ignored. 

4.1. The offset-amplitude-spectral index degeneracy for 
a single pixel 

Consider a hypothetical experiment that observes a 
single pixel at 30, 44, 70, and 100 GHz, with RMS noise 




Amplitude (u.K) Amplitude (jiK) Offset (uK) 




Fig. 2. — The one - (bottom row) and two-dimensional (top row) marginal posteriors for the single-pixel and four frequency-band data 
set described in £14.11 The model includes a free offset for the lowest frequency, a foreground amplitude and a spectral index. The contours 
in the two-dimensional plots indicate where — 21nP has dropped by 0.1, 2.3, 6.17 and 11.8, respectively, corresponding to the peak and 1, 
2, and 3a region for a Gaussian distribution. The crosses mark the true values. In the one-dimensional plots, the dashed lines indicate the 
true values, and the dotted lines show the marginal posterior mean. 



10/iK in each band. Assume that the signal is a straight 
power-law parametrized by amplitude A and spectral in- 
dex P, and that the absolute offset of the detectors is 
known perfectly for the three highest frequencies, but 
not for the 30 GHz band. The signal model is 

T v = mS u , Vl +A(^\ . (30) 

There are three free parameters in this system, the offset, 
amplitude, and spectral index, and four measurements. 
Since the number of constraints exceeds the number of 
degrees of freedom, it should be possible to estimate all 
three parameters individually. 

We simulated one realization of this model, adopting 
the model parameters A = 100 /iK, (3 = —3 and m = 
/iK, and adding white noise to each band. Our priors 
are chosen to be uniform over —300 /iK < A,m < 300 /iK 
and —6 < f3 < 0. We compute the joint posterior by a 
simple x 2 evaluation over a 200 x 200 x 200 grid, and 
marginalize by direct integration. 

Figure [2] shows the results in terms of one- and two- 
dimensional marginal posteriors. The true input values 
are marked by thick crosses in the top panels and by 
dashed lines in the bottom panels. The posterior means 
are shown by dotted lines in the bottom panels. 

This simple example highlights two problems that will 
recur in later sections. First, as the top left panel 
shows, the offset and amplitude are highly degenerate 
and anti-correlated; one may add an arbitrary offset to 
the 30 GHz band and subtract it from the foreground 
amplitude, without affecting the final x 2 . This degen- 
eracy is a crucial issue for CMB component separation. 
Many foregrounds have power-law spectra, and differen- 



tial anisotropy experiments (e.g., WMAP) cannot deter- 
mine absolute offsets. The monopoles of the WMAP 
temperature sky maps were determined a posteriori 
based on a co-secant fit to a crude plane-parallel galaxy 
model (|Bennett et al.ll2003bt iHinshaw et al.ll2007l ). This 
approach is prone to severe modelling errors, precisely 
because of this type of degeneracy. 

The second problem is that integration over a 
highly degenerate joint posterior yields complicated and 
strongly non-Gaussian marginal posteriors. Obtaining 
unbiased point estimates from these posteriors is not 
trivial. Clearly, the posterior mean is not an unbiased 
estimator. Further, as we will see in the next section, 
even the posterior maximum is biased in general, unless 
special care is taken when choosing priors. 

4.2. Uniform vs. Jeffreys' prior 

The strong degeneracies found in the previous example 
can be broken partially by adding more data. Consider a 
full-sky data set pixclizcd at HEALPix resolution N p i X = 
16 (3072 independent pixels). Reduce the noise to 1 /iK 
RMS per pixel. Use the same signal model as before, but 
with an offset common to all pixels, 

T v {p) = m5 v ^+A{p)\ — \ . (31) 

We adopt the spat ially varying synchrotron model of 
iGiardino et all (|2002) as a template for the amplitude 
and spectral index of the signal component 

We simulated a new data set, and computed the 
marginal monopole posterior by direct integration. This 
is straightforward because, for a given value of to, the 
conditional amplitude-spectral index posterior reduces to 
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Fig. 3. — Comparison of the marginal offset posterior for a 
uniform (dashed line) and J effre ys' ignorance (solid line) prior on 
the spectral index /3. See i|4,2l for a full discussion of both the 
model and details of the prior. 



a product of single-pixel distributions. The integration 
therefore goes over a sum of N p i x two-dimensional grids, 
rather than a single 2 pU grid. 

The result is shown as a dashed curve in Figure [3J 
Two points are noteworthy. First, the marginal distri- 
bution is nearly Gaussian, in contrast to the strongly 
non-Gaussian single-pixel posterior shown in the bottom 
panel of Figure Thus, the additional data seem to 
have broken the degeneracy. Second, however, the distri- 
bution has a mean and standard deviation of —1.8 ± 0.4, 
more than 4er away from zero! Repeated experiments 
with different noise seeds gave similar results. 

This behaviour is a result of the choice of prior. We 
initially adopted a uniform prior on the offset, the ampli- 
tudes and the spectral indices, with little thought t o why 
we should do so. This was a poor choice. Jeff reys! (|1961h 
argued that when nothing is known about a particular 
parameter, one ought to adopt a prior that does not im- 
plicitly prefer a given value over another, relative to the 
likelihood. This is not in general the uniform prior. 

Jeffreys argued that the appropriate ignorance prior is 
given by the square root of the Fisher information mea- 
sure, 




(32) 



where the angle brackets indicate an ensemble average. 
This prior ensures that no parameter region is preferred 
based on the parametrization of the likeliho od alone; it 
is the refore a proper ignorance prior (e.g., Box fc Tiaol 
[1991) . 

The log-likelihood corresponding to the model defined 
in Equation 1301 reads 



d v - m5 u , Ul - A ) 



(33) 
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Fig. 4. — Jeffrey s' ig norance prior for the spectral index f}, de- 
fined by Equation 1311 Steep indices are given less weight than 
shallow ones to compensate for their smaller overall impact on the 
likelihood. 



Jeffreys' priors for the three parameters are 

PM) ~ 1. 

Pj(m) ~ 1, 



ft 08) 




(34) 
(35) 

(36) 



Computing the second derivatives of this expression with 
respect to A, m, and /?, we find that the appropriate 



respectively. In general, the ignorance prior for any linear 
parameter in a Gaussian model is uniform because the 
second derivative of the likelihood is constant. However, 
for non-linear parameters greater care is warranted. 

Figure [4] shows Jeffreys' prior for the spectral index /3, 
limited to —4 < (3 < —2. (3 — —2 is given about two and 
half times more weight than j3 = —4. Intuitively, this is 
necessary because there is an asymmetry between a steep 
and a shallow spectrum: A steep spectrum means that 
the signal dies off quickly with frequency, while a shallow 
spectrum implies that it maintains its strength longer. 
Thus, there is a larger allowed parameter volume with 
steep indices than with shallow, leading to an imbalance 
in terms of marginal probabilities. This parametrization 
effect is countered by the Jeffreys' prior. 

The solid curve in Figure [3] shows the result of us- 
ing Jeffreys' prior instead of a uniform prior. Similar 
behaviour is observed independent of noise realization. 
The conclusion is clear: a proper ignorance prior leads 
to unbiased estimates, while a naive uniform prior leads 
to biased estimates. 

In addition to this basic ignorance prior, it may be ben- 
eficial to adopt physical priors, based on knowledge from 
other experiments. For example, if one had reason to 
expect that the dominant signal in a given data set were 
Galactic synchrotron emission, a reasonable prior could 
be (3 = — 3.0±0.3, based on low frequency measurements. 
The physical prior is multiplied by the ignorance prior, 
taking account of both effects. In the rest of the paper, 
when we say that a Gaussian prior is adopted for the 
spectral indices, we mean a product of a Gaussian and 
Jeffreys' prior. 

4.3. Marginalization over high- dimensional and 
degenerate posteriors 
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Fig. 5. — Marginal free-free template amplitude poste riors for 
various priors on the synchrotron spectral index. See £|4.3l for a full 
discussion of this case. 

The previous section shows that given sufficient data 
and an appropriate prior, the marginal posterior is a good 
estimator of the target parameter. In this section we in- 
vestigate what happens when the data are not sufficiently 
strong to break a degeneracy. We replace the single- 
channel offset to by a template amplitude b coupled to 
a fixed free-free template iff(p) and a spectral index of 
/fe=-2.15, 



T v (p)=bt&(p) - 



-2.15 



A(P) ( V - 



(37) 



Two modifications are made to the simulation. First, 
the spectral index of the synchrotron component is fixed 
to (3 S — —3, rather than being spatially varying. Sec- 
ond, a fifth frequency channel is added at 143 GHz. 
No free-free component is added to the data; the op- 
timal template amplitude value is zero. The question is 
whether these data are sufficient to distinguish between 
synchrotron and free-free emission with similar spectral 
indices of —3 and —2.15, respectively. 

The answer is no. Figured] shows the marginal tem- 
plate amplitude posteriors, computed by direct integra- 
tion as in the previous section. The different curves cor- 
respond to different Gaussian priors imposed on the syn- 
chrotron spectral index. All are centered on the true 
value —3, but with different standard deviations A/3 S . 
With the strong prior of A/3 S = 0.01, the amplitude pos- 
terior is well-centered near the true value of zero. How- 
ever, when the prior is gradually relaxed, the marginal 
posterior widens and drifts away from the true value. 
The marginal posterior is not a useful estimator for the 
template amplitude in this case. 

This behaviour is explained by the fact that with 3072 
independent pixels the contribution of noise to the off- 
set amplitude is insignificant compared to the uncer- 
tainty introduced by coupling to the synchrotron com- 
ponent. Moreover, the amplitude and spectral index 
distributions are similar for the two foreground compo- 
nents. As a result, the joint distribution becomes long, 
narrow and curved, like that in the top middle panel 
of Figure^ The marginal one-dimensional posteriors are 
dominated by the "boomerang wing" orthogonal to the 
parameter axis. Similarly, the "wing" parallel to the axis 
is diluted. Given sufficiently strong degeneracies, the 



marginal distributions no longer contain the maximum- 
likelihood point within their, say, 3cr confidence regions. 
When the prior is made increasingly tight, however, the 
wings of the distribution are gradually cut off and the 
marginal distribution homes in on the true value. Thus, 
the collection of distributions shown in Figure [5] in some 
sense visualizes the joint posterior. 

This behaviour may be quantified by means of the co- 
variance matrix of the Gaussian amplitude part of the 
system, defined in Equation 1191 A useful quantity de- 
scribing this matrix is its condition number, the ratio of 
its largest and smallest eigenvalues. For the particular 
case discussed above, we find that the condition number 
is 4 x 10 6 , which, although tractable in terms of numer- 
ical precision for double precision numbers 7 , indicates a 
very strong degeneracy. 

4.4. The offset vs. amplitude degeneracy for full-sky 

data 

The final example we consider before turning to real- 
istic simulations is the same as in £14.21 except we allow 
a free offset for all frequency bands, not just one. (This 
is characteristic of real experiments, which do not know 
the absolute zero-point at any frequency.) The model is 



T v = m v + A(p) — 



0(p) 



(38) 



If the spectral index is constant over the sky, (3{p) = (3, 
this is a perfectly degenerate model: 



(39) 



T v = m v + A{p) — 



\ / \ 

= (m v + Sm[^-) ) + (A(p) - 5m) ( ^ J (40) 



= m' u+ A'(p)[- 







(41) 



One can simply add a constant to the foreground ampli- 
tude, and subtract a correspondingly scaled value from 
each offset. It is thus impossible to determine individu- 
ally the absolute zero-level of foreground component and 
offsets. To obtain physically relevant results, external 
information must be imposed. 

Spatial variations in the spectral index partial ly re- 
solve this degeneracy. In the iGiardino et alj (|2002D syn- 
chrotron model the spectral index (3q varies smoothly on 
the sky between —2.5 and —3.2. The condition number 
of the foreground amplitude-offset covariance matrix is 
2 x 10 7 , and the covariance matrix is no longer singu- 
lar. A modified index model with ten times smaller fluc- 
tuations but the same mean ((3(p) — 0.9 ((3q) + 0.1/3q) 
increases the condition number by two orders of magni- 
tude, to 2 x 10 9 . 

These strong degeneracies lead to the same quantita- 
tive behaviour as seen for the marginal free-free tem- 
plate amplitude posterior in the previous section, mak- 
ing it very difficult to estimate both all offsets and the 

7 The absolute limit on the condition number for reliable matrix 
inversion is 10" 6 for single-precision arithmetic and 10 12 for dou- 
ble precision. However, in practice one should stay well below these 
values, in particular for iterative applications since small numerical 
errors may propagate in an uncontrolled manner. 
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foreground amplitude zero-level individually. In practice, 
external constraints are required. We have implemented 
two approaches for dealing with this degeneracy in our 
code, both based on the projection operator described in 

The first and more direct approach is to assume that 
the offsets of one or more bands are known a-priori by 
external information. For instance, if an experiment 
somehow measured total power, as opposed to differ- 
ences alone, detailed knowledge about the instrument 
itself could be used for these purposes. The advantage 
of this approach is that it is exact, assuming the valid- 
ity of the prior, and the accuracy of all uncertainties is 
maintained. It is implemented simply by demanding that 
'"Ivq — 0, which requires, in terms of the orthogonality 



vectors defined in fc|3.2.1[ q Vji 



A,o- 



The second approach is based on the observation that 
the degeneracy between the foreground amplitude and 
the offsets seen in Equation [41] leads to a very specific 
frequency distribution of offset amplitudes. Specifically, 
m' v = (m v + 6m(v/i/i)P), where Sm is an arbitrary con- 
stant, but common to all frequency bands. It is therefore 
possible to require that the set of offsets should not have 
a frequency spectrum that matches the foreground spec- 
trum. 

The corresponding constraint on m v may be derived 
from 

v,pp' \ J 



m v — Sm — 



(42) 



by first taking the derivative with respect to Sm, and 
then enforcing a vanishing foreground component, Sm — 
0, 



p,p> 



0{p') 







(43) 



The expression in brackets says that the offsets should be 
orthogonal to the mean noise-weighted foreground spec- 
trum. 

If the total signal model includes more than one sig- 
nal component with a free amplitude at each pixel, then 
these should be included jointly in the above \ 2 ■ A par- 
ticularly important case is that including both a CMB 
signal, which has a frequency-independent spectrum, and 
a proper foreground component. For this case, we have 
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Fig. 6. — Verification of the CMB power spectrum distributions 
produced by Gibbs sampling. Black lines show analytically com- 
puted slices through the joint likelihood C(Ci), and red lines shows 
the same computed by Gibbs sampling. Vertical lines show the the- 
oretical input spectrum used to generate the simulation (solid lines) 
and the realization specific spectrum (dashed lines), respectively. 

While this orthogonality constraint is effective for es- 
timating the absolute zero-level of the foreground com- 
ponent in question, it corresponds to a strong implicit 
prior that is not likely to be compatible with reality. If 
there are indeed random offsets at all frequencies, some 
fractional combination of these offsets will mimic a fore- 
ground component. In the above approach, this compo- 
nent is defined to be a foreground signal, rather than an 
offset. Further, no mixing between the two components 
is allowed. Thus, the estimated error bars on both the 
offsets and foreground zero-level will be underestimated. 

Recall, however, that this entire discussion concerns 
the relative contributions to the foreground zero-level 
and the free offsets, not the CMB signal, which relies 
on the sum of the two components alone. The fact 
that the estimated error in the foreground zero-level 
is under-estimated by a small factor, say, four or five 
(o"cst ~ 0.5 fiK vs. dtruc ~ 2 /iK; see the simulation de- 
scribed in Sj6]), is of no consequence for most applications. 
Far more important is the fact that this approach pro- 
vides excellent estimates of both the CMB sky signal 
and the spectral index distribution, the two quantities 
where most of the physics lie. This is in sharp contrast 
to the method employed by the WMAP team, which is 
based on a co-secant fit to a plane-parallel G alaxy model 
(|Bennett et al.ll2003bt iHinshaw et al.ll2007f) . While that 
specific approach is prone to severe modelling errors be- 
cause of its lack of detailed foreground modelling, the 
current approach is internally consistent with respect to 
all signal components. For more discussion on this is- 
sue, see Appendix |C| as well as the actual ana lysis of the 
3-yr data presented by lEriksen et al.l (|2007d ) . In that 
analysis, a common offset of ~ — 13 /j,K is detected in all 
frequency bands, as well as a significant residual dipole 
in the V-band data. 



where mo is the additional degree of freedom introduced 
by the CMB signal. The equivalent constraint on m v de- 
rived from this expression is notationally more involved 
(see Appendix [Cl for a full derivation and constraints), 
but may be written as before in terms a set of orthogo- 
nality vectors q\, . 



4.5. Summary 

The above discussion may be summarized by the fol- 
lowing observations: 

• The marginal mean is a good estimator only for 
mildly degenerate and non-Gaussian joint distri- 
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butions. Strongly degenerate models should be 
avoided, because they are difficult to summarize by 
simple statistics, and because it takes a prohibitive 
number of samples to fully explore them. 

• The uniform prior is a proper ignorance prior for 
Gaussian variables only. In general, Jeffreys' rule 
should be used in the absence of informative priors. 

• For experiments with unknown offsets at each fre- 
quency band, there is a strong degeneracy between 
these offsets and the overall zero-level of the fore- 
ground amplitudes. This degeneracy should be 
broken by external or internal priors, if marginal 
posteriors are to be used as estimators. 

5. CODE VERIFICATION 

In §|4] we considered simple toy models to develop in- 
tuition about the target distributions. We used analyt- 
ical, brute-force computations to avoid the complexities 
of real-world computer code. In this section, we turn 
our attention to Commander, our implementation of the 
joint foreground-CMB Gibbs sampler described in £|3j 

Three conditional distributions are involved in this 
joint Gibbs sampler, namely the CMB power spec- 
trum distribution P(Ci\s), the amplitude distribution 
P(s, a u ,i, bj,Ck\Ci, 6k, d), and the spectral parameter dis- 
tribution P(8k\s, a u ,i, bj, Ck, d). In the following three 
subsections, we test the output from Commander for 
these three conditional distributions against analytical 
expressions, at low resolution, to verify both the general 
sampling algorithms and our specific implementation. 

5.1. The CMB power spectrum sampler 

To verify the CMB power spectrum distribution 
P(Cg\s), we construct a low-resolution CMB-only sim- 
ulation as follows. Draw a random C MB realization 
from a standard ACDM power spectrum (|Spergel et al.l 
120071) . smooth to 10° FWHM, and pixelize at N side = 16. 
Add white noise of 1 li K RMS to each pixel . Impose 
the WMAP Kp2 sky cut ([Bennett et alJl2003ti ). without 
point sources and downgraded to N p i x = 16, on the data. 

We compute slices through the corresponding like- 
lihood by considering each I individually, fixing all 
other multipoles at the input power spe ctrum, with a 
brute- force calculation in pixel-space (e.g. jEriksen et all 
l2007a| ). and with Commander. The outputs from 
the latter are sm oothed through Rao-Blackwcllization 
(|Chu et al.ll2005l) to reduce Monte Carlo errors. 

Figure[6] shows the results for four multipoles. The the- 
oretical input spectrum is shown by vertical solid lines, 
and the true realization spectrum by dashed lines. Com- 
mander reproduces the CMB power spectrum distribu- 
tions perfectly. 

5.2. The Gaussian amplitude sampler 

To verify the amplitude distribution 
P(s, a v ^, bj, Ck\Ct, 6k, d), we construct a simulation 
at iV si dc = 8 (768 independent pixels, angular resolution 
20° FWHM). The CMB realization is the same as in 
the previous section, appropriately smoothed. Five 
frequency channels are simulated, corresponding to 
the five WMAP channels. In addition to the CMB 
sky signal, s, we add a synchrotron signal, c(p) with 




Spherical harmonic coefficient, a [Q _ (u\K) K-band offset (u\K) 
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Synchrotron amplitude at K-band (uK) Dust template amplitude at W-band 

Fig. 7. — Verification of Gaussian amplitude sampler. Analytical 
marginal posteriors are shown by smooth distributions, and results 
from Commander are shown by histograms. The true values are 
indicated by vertical dashed lines. 

a spatially varying spectral index, a dust template 
with an amplitude, 6, scaled to unity at W-band, 
and a, ao — — 10/xK monopole to the K-band. (Sec 
foreground description in Section 16.11 for further details 
on this model.) Thus, all four types of amplitudes are 
represented. White noise of 1 /iK RMS is added to each 
pixel at each frequency. 

We fix the CMB power spectrum and synchrotron spec- 
tral index map, and compute the joint Gaussian ampli- 
tude distribution both analytically and with Comman- 
der. The analytical computation is performed by direct 
evaluation of the mean x and covariance matrix A de- 
fined by Equations [19] and [20] The marginal variances of 
each parameter are given by the diagonal elements of A. 

Figure [7] shows the marginal distributions for one pa- 
rameter of each type. Again, Commander reproduces the 
exact analytical result perfectly. 

5.3. The spectral index sampler 

To verify the spectral index sampler for 
d), a single-pixel distribution, we 
simulate a single pixel. The signal model is identical 
to that in i )4.1l comprising a synchrotron component 
with unknown amplitude and spectral index, plus an 
unknown offset at the lowest frequency. 

We compute the corresponding three-dimensional joint 
posterior by direct grid evaluation and by Commander. 
Figure[8]shows the corresponding marginal distributions. 
Again we find perfect agreement. 

All conditional distributions currently implemented in 
Commander have thus been verified. 

6. APPLICATION TO SIMULATED 3-YR WMAP DATA 

We turn now to a more realistic simulation, with prop- 
erties corresponding to the 3-yr WMAP data. The sim- 
ulation has two goals. First, to show that the method 
can handle data with realistic complexities, and that it 
is applicable to the current WMAP data and (even more 
importantly) the up-coming Planck data. Second, to pro- 
vide the necessary background for understanding the re- 
sults from the actual 3-yr WMAP analysis presented by 
lEriksen et all (|2007d) . 
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Fig. 8. — Comparison of marginal posteriors for a single pixel, 
computed both analytically (thick, dashed curve) and with Com- 
mander (thin line) . The true value is indicated by a vertical dotted 
line. 



6.1. Simulation, model and priors 

We construct the simulation as follows. Draw a CMB 
sky realizatio n from the bes t -fit A CDM power spectrum 
presented by ISpergel et al.1 (|2007j ) . Convolve with the 
each of t he beams of the ten differencing assemblies of 
WMAP (jBennett et al.ll2003al ). pixelized at a HEALPix 
resolution of N s id c = 512. Add white noise to each map 
with standard deviation a(p) = o-q I y A^bs (p) , where 
iVpi x is the number of observations at pixel p (provided 
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Fig. 9. — Top panel: CMB (black lines) and noise power spectra 
(blue and red lines) for the 3-yr WMAP V-band channel. Solid 
lines show spectra for the 3° FWHM data set, and dashed lines 
show spectra at the native resolution of the V-band data. Bot- 
tom panel: CMB signal to noise ratio for instrumental (blue) and 
regularization (red) noise, respectively. 

on Lambda 8 together with the actual sky maps). 

Downgrade these ten maps to a common resolution 
of 3° FWHM and N pix = 64, bandlimiting each map 
at ^max = 150. Create frequency maps by co-adding 
differencing assembly maps at the same frequency (e.g., 
Q = (Ql+Q2)/2). Add uniform white noise of 2//K RMS 
to each frequency map to regularize the noise covariancc 
matrix. 

Figure [5] shows the CMB and noise spectra of the co- 
added V-band data, both at the native resolution of the 
frequency band (dashed lines) and at the common 3° 
FWHM resolution (solid lines). The CMB to regulariza- 
tion noise ratio is unity at £ ~ 120, and less than 2% 
at £ max = 150. Both the instrumental and the regular- 
ization signal-to- noise ratios are > 500 at £ < 50, and 
therefore negligible at these scales compared to cosmic 
variance. 

Instrumental noise averaged over the full sky is larger 
than the regularization noise everywhere below £ < 80. 
In the ecliptic plane, where the instrumental RMS is 
about a factor of two larger than the full-sky average 
due to WMAP's scanning strategy, it dominates below 
£ < 100. The result of this unmodcllcd noise term is, 
as we will see later, a somewhat high pixel-by-pixel % 2 
in the ecliptic plane. However, since this error term is 
correlated only on very small scales (the beam size of 3° 
FWHM), and we understand its origin and benign be- 
haviour, it does not represent a significant problem for 
the analysis. With additional years of WMAP observa- 
tions, and the addition of the Planck data, this noise 
contribution will be further suppressed. Further, we will 
consider in the future various approaches for taking this 
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term into account, for instance by computing explicitly 
the corresponding sparse covariance matrix. 

Our foreground model has three components, syn- 
chrotron, free-free, and thermal dust emission. For syn- 
chrotron emission, where the spectral index is known to 
vary substantially with position on the sky , we extrapo- 
late the 408 MHz map (Has lam et al.lll982l ) using a map 
of the spectral index for each pixel. For the latter we use 
an updated version of the Giardino et al. (2002) spec- 
tral index map that is based on 408 MHz and WMAP 
23GHz data, after removing the free-free emission via the 
WMAP MEM free-freemodel (Bennett et al. 2003b) 9 . 
The free-fr e e mo del is defined by the H a template of 
IFinkbeiner! (|2003f ). scaled to 23 GHz assuming an elec- 
tron temperature of T e = 4000 K and a spatially constant 
spectral in dex of (3fr = —2.15 . The dust model is based on 
model 8 of IFinkbeiner et alj (|1999l ). evaluated at 94 GHz 
and scaled to other frequencies using a single-component 
modified black-body spectrum with Td = 18.1 K and an 
emissivity index of a = 2.0. Anomalous dust is ignored 
in this analysis. 

Guided by the results of lEriksen et al.1 (|2007cD . we add 
a common offset to all frequencies of —13 /xK. No dipole 
contributions are added to the simulations. 

For the power spectrum analysis in £16.41 we analyze 
the same realization with and without foregrounds, but 
with the same sky cut. This allows us to distinguish 
between sky-cut and foreground-induced effects. 

We a dopt the same param etric signal model as that 
used bv lEriksen et al.l (|2007d ). 



T u (p) = s(p)+m° v + 



t{p)a{v) 



i=l 



n(p)] 



.d List 



1.7" 



+ f(p)a(u) { -1 



P(p) 



(45) 

The first term is the CMB sky signal. The second and 
third terms are the monopole m° and three dipole com- 
ponents m z defined by standard Cartesian basis vectors. 
The fourth term is a dust tracer, based on the FDS tem- 
plate coupled to a fixed spectral index of (3^ — 1.7, and 
a free overall amplitude b. The postulated power-law 
spectrum does not match the modified black-body spec- 
trum used to create the simulation, and modelling errors 
are therefore to be expected. The fifth term is a single 
low-frequency foreground component with a free ampli- 
tude f(p) and spectral index (3{p) at each pixel p. The 
antenna-to-thermodynamic differential temperature con- 
version factor is a(y), as always. 

In addition to the previously described Jeffreys' prior, 
we adopt a prior of (3 = —3 ± 0.3 for the low-frequency 
foreground spectral index, assuming that the foreground 
signal is synchrotron emission unless the data require 
otherwise. This is not a particularly strong prior: The 
free-free spectral index of fig — —2.15 is only 2.8a away 
from the prior mean, and it does not take a large free- 
free amplitude to overcome this. For instance, near the 
galactic plane the standard deviation of the marginal in- 
dex posterior is ~ 0.01, thirty times smaller than the 

9 These models were produced as part of the development of 
the Planck Sky Model, under the coordination of Planck Working 
Group 2. 



prior width. At high latitudes, on the other hand, the 
synchrotron spectral index is for all practical purposes 
unconstrained. The prior prevents this component from 
interfering with the CMB signal in regions where its am- 
plitude is low. 

We impose the orthogonality constraint discussed in 
£14.41 to break the degeneracy between the free monopoles 
and dipoles at each band, and the foreground zero-level 
and dipole. An important goal in the following is to see 
whether this approach yields sensible results. 

With the simulation, model, and priors defined, we 
compute the joint and marginal posteriors using the ma- 
chinery described earlier in the paper. The wall-clock 
time for generating one single sample is ~ 50 s, paral- 
lelized over five 2.6 GHz AMD Opteron 2218 processors, 
one for each frequency band. We generate five chains 
with 1000 samples each, for a total wall clock time of 
14 hr. The total computational cost is 350 CPU hours. 

6.2. Burn-in, correlation lengths and convergence 

We begin our examination of the results by plotting the 
output Markov chains as a function of iteration count 
in Figure [101 Each panel shows the evolution of one 
parameter, such as the CMB power spectrum coefficient 
for a single multipole or the dust template amplitude. 

Burn-in is a crucial issue for Markov chain algorithms. 
The chains were initialized with a random CMB power 
spectrum over-dispersed relative to the true distribution, 
and the spectral indices of the low-frequency foreground 
component were drawn randomly and uniformly between 
—4 and —2. The Gibbs sampler needs some time to con- 
verge to the equilibrium distribution; as we see in Fig- 
ure [TQl about 200 iterations are required to reach the 
equilibrium state. 

The last parameters to equilibrate are the global 
monopole and dust amplitudes, because the uncertainty 
in these very high signal-to-noise parameters is very 
small, and only small steps can be made between con- 
secutive Gibbs samples. Moreover, since these are global 
parameters, they couple to all other parameters. 

The x 2 trace plot shows an interesting feature. After 
reaching a minimum \ 2 solution after about 100 itera- 
tions, the chain stabilizes at a very slightly higher equi- 
librium value. This is due to the fact that the full dis- 
tribution consists not only of the sky signal components, 
but also the CMB power spectrum. Maximizing the total 
joint posterior value is therefore a compromise between 
minimizing the sky signal \ 2 an d optimizing the CMB 
power spectrum posterior. At iteration number 100, the 
CMB component is still burning in, whereas the fore- 
ground amplitude, the single most important parameter 
in terms of x 2 j has already reached its equilibrium. The 
Markov chain thus overshoots in y 2 minimization until 
the CMB power spectrum equilibrates. 

Correlation length is a second crucial issue for Markov 
chain algorithms. In general, classic Metropolis-Hastings 
algorithms have a long correlation length because they 
propose relatively small modifications at each iteration 
in order to maintain high acceptance probability. The 
Gibbs sampler works differently Because it samples from 
exact conditional distributions, large jumps are perfectly 
feasible, at least in the absence of strong conditional cor- 
relations. (In the present case there are no such strong 
correlations.) The CMB power spectrum and CMB sky 
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Fig. 10. — Trace plots showing the evolution of the Gibbs chains 
as a function of iteration count, for each type of parameter. The 
true input values are indicated by a horizontal dashed line, where 
applicable. The sky map type parameters are thinned by a factor 
of ten to reduce disk space requirements. 

signal are only weakly correlated in the high signal-to- 
noise regime, and the foreground spectral index couples 
only moderately strongly to the foreground amplitude of 
the same pixel, and weakly to anything else. The re- 
sult is excellent mixing properties and short correlation 
lengths. 

This translates into a high sampling efficiency and 
a relatively small number of samples required for con- 
vergence. To quantify this, we adopt the widel y 
used Gelman- Rubin R statistic (Gel man fc Rubinlll992f ). 
which is the ratio between two variance estimates. If the 
Markov chains have converged, the two estimates should 
agree, and their ratio, R, should be close to unity. A 



TABLE 1 

MONOPOLE AND DIPOLE POSTERIOR STATISTICS 





Monopolc 


Dipole X 


Dipole Y 


Dipole Z 


Band 


( M K) 


OiK) 


( M K) 


( M K) 


K-band 


-12.4 ±0.5 


-0.6 ±0.7 


-0.4 ± 0.5 


-0.1 ±0.1 


Ka-band 


-10.3 ±0.5 


-1.2 ± 0.7 


-0.7 ±0.5 


-0.2 ±0.1 


Q-band 


-10.8 ±0.5 


-1.1 ± 0.7 


-0.7 ±0.5 


-0.1±0.1 


V-band 


-12.3 ±0.5 


-0.6 ± 0.7 


-0.6 ±0.5 


-0.1±0.1 


W-band 


-12.8 ±0.5 


-0.3 ± 0.7 


-0.2 ±0.5 


-0.1 ± 0.1 



Note. — Means and standard deviations of the marginal 
monopole and dipole posteriors. 

typical recommendation is that R should be less than 1.1 
to claim convergence, given that the chains were initially 
over-dispersed, although smaller numbers are clearly bet- 
ter. 

Computing this statistic for the five chains above, while 
discarding the first 200 samples, we find that R is less 
than 1.01 for the CMB power spectrum up to £ ~ 100, 
less than 1.05 for the both the CMB pixel and foreground 
amplitudes all over the sky, and less than 1.01 for the 
template amplitudes. Thus, even with such a relatively 
modest number as 4000 samples, excellent convergence 
has been reached on all marginal statistics. We return to 
the question of joint convergence of the CMB spectrum 
posterior in §6.41 

6.3. Component separation results 

We now turn to the marginal distributions of the es- 
timated signal parameters, and focus first on the sig- 
nal components. The CMB power spectrum is discussed 
separately in the next section. The sky map results are 
summarized in Figure [11] in terms of the marginal pos- 
terior means, standard deviations, and differences be- 
tween the posterior means and the input maps. Table [T] 
lists the monopole and dipole results. The dust tem- 
plate amplitude posterior mean and standard deviation 
is b = 1.013 ±0.002. 

Considering first the left column in Figure II H we see 
that the three sky map reconstructions are visually com- 
pelling. No obvious foreground residuals are observed 
in the CMB map, familiar structures such as the North 
Galactic Spur and Gum Nebula are seen in the fore- 
ground amplitude map, and the spectral index map dis- 
tinguishes clearly between the known synchrotron and 
free-free regions. 

These visual considerations are quantified in the right 
column, where the input maps has been subtracted from 
the posterior means 10 . We see that the CMB map has 
residuals at the ~ 3 fiK RMS level, with a peak-to-peak 
amplitude of ±8 /iK. Little of these residuals is corre- 
lated on the sky except for a few patches near the galactic 
plane. Most of the differences are simply due to instru- 
mental noise. 

For the foreground amplitude, more distinct correlated 
patches are seen, in particular in regions with strong free- 
free emission. This is due to the fact that a single power 
law is not a sufficiently good approximation to the sum 
of the free-free and synchrotron components, relative to 
the statistical uncertainty. 

10 For the foreground amplitude, the input map was estimated 
by fitting a single power law to the sum of the synchrotron and 
free-free components. 
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Fig. 11. — Marginal posterior sky signals from the WMAP simulation. The left column shows the posterior mean for each pixel, the 
middle column shows the posterior standard deviation, and the right column shows the difference between the estimated posterior mean 
and the known input signal. From top to bottom, the rows show 1) the CMB solution; 2) the low-frequency foreground amplitude solution; 
and 3) the low-frequency spectral index solution. 
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Fig. 12. — Mean \ 2 map computed over posterior samples. A 
value of x 2 = 15 corresponds to a \ 2 that is high at the 99% 
significance level. 



Finally, even the the spectral index difference map 
shows clearly correlated regions, and additionally a neg- 
ative bias of about —0.1. This bias is primarily due to 
two effects. First, as reported at the beginning of this 
section, the dust template amplitude is over-estimated 
by 1-2%, mainly because of mis-specification of the dust 
spectrum. As a result, slightly too much signal is sub- 
tracted from the higher frequency channels, and this in 
turn steepens the spectral index of the remaining signal. 
Second, at high latitudes the data are noise dominated, 
and the f3 = —3 ± 0.3 prior becomes active. Because the 
true signal has an average of ~ —2.9 at high latitudes, a 
bias of ~ —0.1 results. 

Comparing the actual difference maps with the esti- 
mated errors shown in the middle column of Figure 111! 
we see that the errors of the CMB and foreground am- 
plitudes are underestimated by a factor of ~ 1.5 to 2. 



(These plots are typically scaled to a dynamical range 
of ~ ±3(7. The expected peak-to-peak range in a differ- 
ence plot is therefore roughly three times the RMS er- 
ror.) This is due to modelling errors in two forms. First 
and foremost, we neglected the smoothed instrumental 
noise in our data model, and this causes a significant un- 
modelled uncertainty at the smoothing scale. However, 
being random with zero mean, it does not induce sig- 
nificant structure on larger scales, and it therefore has 
negligible impact on the scales of cosmological interest 
(I < 30). Second, the foreground model is simplified 
compared to the input, as we approximate the sum of 
two power law components by a single power law, and 
also assume a simple power-law dust spectrum while the 
input sky has a modified black-body spectrum. Com- 
bined, these effects introduce errors not captured by the 
estimated statistical uncertainties. 

Table [1] lists the posterior mean and standard devia- 
tions of the monopole and dipole coefficients. Recall that 
the input parameters in the two cases were —13 f^K and 
/iK, respectively. In general, these values are recon- 
structed reasonably well, although the error estimates 
are somewhat underestimated for the Ka- and Q-band 
monopoles. We see that the orthogonality constraint de- 
scribed in §4.41 is quite effective, producing a good esti- 
mate of all quantities of interest. On the other hand, it 
does have the effect of artificially reducing the error bars 
on the template amplitudes somewhat, and also correlat- 
ing them. However, misestimation of the monopole error 
estimates by a few microkelvin is a small price to pay 
for an absolute estimate of the foreground amplitudes to 
within a few percent. 

The features seen in the CMB RMS map may be under- 
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stood qualitatively in terms of the above results. First, 
the most dominating structure is a hot-spot centered on 
the Galactic plane. This is mainly due to the coupling 
between the galactic foregrounds and the x-component 
of the dipole. Because of the large foreground signal 
in this direction, it is hard to estimate the correspond- 
ing dipole component (see Table [I} , and this transfers 
uncertainty from low to high latitudes. Note, however, 
that this particular component has a very specific corre- 
lation structure on the sky, which is taken implicitly into 
account by the algorithm; this uncertainty does there- 
fore not significantly affect high-£ modes in the power 
spectrum, even though it looks visually dominating in a 
marginal RMS map. Second, the masked Galactic plane 
has a very high uncertainty, although not infinite; the 
requirement of isotropy implies that the modes inside 
this plane is to some extent restricted, at least on large 
angular scales. Finally, as expected there is a (weaker) 
correlation between the foreground amplitude and spec- 
tral index maps and the CMB RMS map. 

FigurefTJlshows the average \ 2 computed over the 4000 
accepted samples. A value of 15 in this plot corresponds 
to a model that is excluded at the 99% confidence level. 
Two points are worth noticing in this plot. First, the 
ecliptic plane stands out with higher x 2 values. As de- 
scribed above, this is due to the unmodelled, smoothed 
instrumental noise. At a smoothing scale of 3° FWHM, 
this component is not fully negligible for the 3-yr WMAP 
data relative to the CMB signal, and therefore causes a 
slight bias at the smallest scales, i > 100. However, we 
are mainly interested in i < 50, and in this range the in- 
strumental signal-to-noise ratio exceeds 100 everywhere. 
This term does not affect the CMB signal of primary 
interest. 

The actual foreground-induced modelling errors are 
very small. Indeed, despite the fact that we approximate 
the sum of two different power laws with a single com- 
ponent, and assume an incorrect dust spectrum, the % 2 
distribution is essentially perfect near the ecliptic poles, 
and the residuals are very small even close to the Galactic 
plane. 

However, the \ 2 for the global solution as a whole is 
somewhat poor, with a reduced \ 2 °f 1-68. This large 
value is largely dominated by unmodelled smoothed in- 
strumental noise in the ecliptic plane, as discussed above; 
when analyzing the same data set at a smoothing scale of 
6°, rather than 3°, we found a reduced \ 2 ~ l-l- Thus, 
when using the products from this analysis in subsequent 
studies (e.g., for cosmological parameter estimation), it 
is important to include only those scales that are unaf- 
fected by the degradation process itself." 

In summary, the overall results are very promising in- 
deed. The CMB sky signal is reconstructed to within 
a few percent everywhere, as is the foreground ampli- 
tude. Further, the spectral indices are accurate to the 
~ 0.1 level wherever there is a significant signal, and the 
monopole and dipole coefficients are very close to the 
true values. Finally, even the reconstructed dust tem- 
plate amplitude is correct to within 2%. 

The results are slightly more mixed when it comes to 
estimation of uncertainties. For components with a rel- 
atively large intrinsic uncertainty, such as the CMB sky 
signal and foreground spectral index, the error estimates 
are quite reasonable. On the other hand, for parameters 
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Fig. 13. — Top panel: Comparison of the posterior mean real- 
ization power spectra, o^, 1) as computed on the full sky (i.e., true 
realization spectrum; red line); 2) as computed on the cut sky, but 
not including foregrounds either in the simulation or in the model 
(blue line); and 3) as computed with Commander on the full simu- 
lation, including the foreground complexity (black line). The gray 
area indicates the lcr confidence region for the case that includes 
foregrounds. Bottom panel: Difference between the spectra shown 
above; blue line shows the difference with and without foregrounds, 
and the red line shows the difference between the spectra including 
foregrounds and a sky cut and the true spectrum . 

with a high intrinsic signal-to-noise ratio, most notice- 
ably the dust template amplitude, the errors are clearly 
under-estimated because of significant modelling errors. 
(We note that analysis of simulations with a foreground 
composition and priors that matches the assumed model 
yields, as expected from the results shown in Section [51 
both point estimates and uncertainties in agreement with 
expectations.) 

The remaining and key issue is what the impact of 
these residuals and increased uncertainties are on the 
CMB power spectrum and cosmological parameters at 
£ < 30. This is the topic of the next section. 

6.4. CMB power spectrum and cosmological parameters 

We now consider the CMB power spectrum posterior 
with the goal of understanding the impact of both resid- 
ual foregrounds and error propagation on the final re- 
sults. To do so, we consider both the identical simula- 
tion described in the previous section and a similar one 
in terms of instrumental properties and sky coverage, but 
excluding foregrounds both from the simulated data and 
the model. Comparing the two against each other allows 
us to disentangle the effects of foregrounds and sky cut. 

The top panel of Figure [13] shows the posterior mean 
realization specific spectrum, for three different cases. 
First, the true full-sky spectrum is plotted as a red line. 
Second, the cut-sky but CMB-only spectrum is shown 
as blue curve, and finally, the cut-sky and "foreground- 
contaminated" spectrum is shown as a black curve. The 
lcr confidence region about the latter is marked as a gray 
region. The bottom panel shows the difference between 
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Multipole moment, / 

Fig. 14. — Top panel: Contributions to the total power spec- 
trum uncertainty from cosmic variance (dotted line), sky cut and 
instrumental noise (dashed line) and foregrounds (solid line), re- 
spectively. Bottom panel: Ratio of noise and sky cut errors (dashed 
line) and foreground errors (solid line) to cosmic variance. 

the foreground-contaminated spectrum and the full-sky 
spectrum (red), and difference between the two cut-sky 
spectra (blue). 

In terms of absolute differences, we see that the fore- 
ground errors (blue curve in Figure I13|) are in general 
less than 50 /zK at £ < 30, with a few occasional peaks 
at ~ 100 fiK 2 , and without any striking biases. Already 
at this point, we may thus predict that the absolute ef- 
fect of residual temperature foregrounds on cosmological 
parameters will be small at large angular scales, when us- 
ing the component separation method presented in this 
paper. 

Next, we consider the foreground induced uncertain- 
ties. First we note that if the total CMB spectrum un- 
certainty has been properly estimated, then the full-sky 
difference spectrum (red curve in Figure 1 1 3j> should be 
distributed according to the uncertainties indicated by 
the gray region. Except for some noticeable correlated 
features around £ sa 50, this agreement is quite reason- 
able. Second, the blue curve shows the differences due to 
foregrounds alone. This term should thus be described 
by a corresponding increase in the total uncertainty when 
including foregrounds in the analysis. 

To understand the relative magnitude of these contri- 
butions, it is instructive to compute the relative magni- 
tudes of the errors due to cosmic variance, sky cut and 
instrumental noise, and foregrounds. These can be esti- 
mated from quantities ready at hand. First, the standard 
expression for the cosmic variance is 

2 

Var(cosmic variance) = ^ ^ C g , (46) 

Second, the uncertainty due to the mask and instrumen- 
tal noise alone is given by the variance of cr|, 

Var(mask,noise) = {{a)) 2 - (e^) 2 ), (47) 



where a\ are generated in the analysis without fore- 
grounds. Similarly, the uncertainty due to the combined 
effect of the mask, instrumental noise, and foregrounds 
is given by the same expression but computed from the 
samples that also include foregrounds. To establish an 
order of magnitude approximation of the foreground- 
induced uncertainty alone, we assume that the variances 
add in quadrature, 

Var(fg) = Var (mask, noise, fg) — Var(mask, noise) (48) 

This is of course not strictly correct, because the errors 
in question are quite non-Gaussian, but it is a sufficient 
approximation for our purposes. 

These three functions are shown in the top panel of 
Figure Q3] for the data set described above. In the bot- 
tom panel, we show the ratio of the mask and noise error 
and the foreground error, respectively, to cosmic vari- 
ance. First we note that the foreground error is always 
smaller than the mask and noise induced error, except 
at the very highest £'s, where the estimates are any- 
way not reliable. However, at £ ~ 40 these two are 
almost comparable in magnitude, both at the 25-50% 
level of the cosmic variance. Once again assuming that 
these errors add in quadrature, neglecting a 20% error 
term implies underestimating the full errors by about 
2% (Vl + 0.2 2 -l = 0.02). 

At larger angular scales, we see that the foreground un- 
certainty becomes essentially irrelevant, simply because 
the cosmic variance dominates completely: At £ < 30, 
the relative magnitude of this term is seldom more than 
10% of the cosmic variance, which translates to a relative 
under-estimation of the total error by only 0.5%. This 
implies a somewhat surprising conclusion: Exact fore- 
ground error propagation is not important on the very 
largest scales, below, say, £ < 30, simply because the 
cosmic variance is so totally dominating. However, the 
same does not necessarily hold true on smaller scales. 
At £ w 40, the foreground error increases the total un- 
certainty by several percent, and this is likely to increase 
further to smaller scales 11 . It could constitute a very sig- 
nificant fraction of the total error in the range between 
£ ~ 50-200, depending on the spatial foreground power 
spectrum. 

In Figure 1151 we plot eight slices through the power 
spectrum likelihood between £ — 2 and 50, for the two 
cases that exclude (black curves) and include (red curves) 
foregrounds. In these plots, we see the same behaviour 
as discussed above: At very low £'s, the widths of the dis- 
tributions with and without foregrounds are essentially 
identical. However, at £ = 40 the effect of the foreground 
uncertainties starts to become visible, as the red distri- 
bution is noticeably wider than the black distribution. 
Still, it is also very clear from this figure that the effect 
of neglecting foreground errors in the temperature spec- 
trum at £ < 50 in terms of cosmological parameters will 
be minimal. 

To demonstrate this statement, we define a simple two- 
parameter power spectrum model, 

11 The peculiarly low foreground error at I m 50 may be con- 
nected with the lack of features in the dust template at these an- 
gular scales; we do not believe it is a general feature. 
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Fig. 15. — Slices through the CMB power spectrum likelihood. Black lines show the distributions from the foreground-free simulation, 
and red lines show the same for the simulation that did include foregrounds. The ensemble-averaged spectrum is indicated by vertical 
solid lines, and the true realization-specific spectrum by dashed lines. Note that at very low Ps, the distributions are essentially identical, 
because of cosmic variance domination. At i > 30—40, the additional uncertainty due to foregrounds start to become visible. 

Two conclusions may be drawn from this exercise. 
First, the method presented in this paper is fully ca- 
pable of extracting the valuable cosmological signal from 
the 3-yr WMAP data at large angular scales in quite re- 
alistic simulations, even when using the simplified fore- 
ground model described earlier. Second, the increased 
uncertainty in cosmological parameters due to these fore- 
grounds at I < 30 is negligible. 
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Fig. 16. — Joint posteri or d istributions for the two-parameter 
model defined by Equation 1491 Dashed contours show the results 
for the analysis without foregrounds, and solid contours show the 
same for the analysis with foregrounds. In both cases, the contours 
are where — 2 In P(q, n\d) rises by 0.1, 2.3, 6.17 and 11.8 from its 
minimum value, corresponding (for Gaussian distributions) to the 
peak and the 1, 2, and 3<r confidence regions. The cross marks the 
true input value, (q, n) = (1,0). 

where q is a free overall amplitude, n is a spectral tilt 
parameter, and C™ is the actual theoretical power spec- 
trum used to generate the simulation. We then map out 
the corresponding two-d imensional poste rior using the 
Blackwell-Rao estimator (|Chu et al.l l2005) . The analysis 
is restricted to the range between I = 2 and 30, which 
is the primary target range for our first WMAP analysis 
(jEriksen et al.ll2007cD . 

Figure [16] shows the results in terms of two sets of 
contours. The dashed contours show the results from the 
analysis excluding foregrounds, and the solid contours 
show the results including foregrounds. The true value 
of (q,n) = (1,0) is marked by a cross. The agreement 
between the two sets of results is excellent. 



7. CONCLUSIONS 

We have presented an algorithm for joint component 
separation and CMB power spectrum estimation. This 
algorithm is a natural extension of the CMB Gibb s 
sampler previously deve loped by Uewell et al.l (|2004j) . 
IWandelt et "all (|2004j) and lEriksen et alj (l2004bT) and th e 
foreground sampler described bv lEriksen et all (|2006h . 
The basic product from this algorithm is a set of joint 
samples drawn from the full joint posterior P(CV, s, 6\d), 
where Ce is the CMB power spectrum, s is the CMB 
sky signal, and 9 denotes the set of all parameters in 
the foreground model. With this tool, exact marginal- 
ization over very general foreground models is feasible, 
and proper foreground uncertainties may be propagated 
seamlessly through to the CMB power spectrum and, 
therefore, to cosmological parameters. 

There are some potential pitfalls the user of the method 
needs to be aware of before applying it to real data. In 
particular, one has to pay attention to possible degen- 
eracies in the parametric signal model fitted to the data. 
Such degeneracies are not uncommon in models relevant 
to CMB foreground analysis. Two specific examples are 
synchrotron and free-free emission, with spectral indices 
of (3 S ~ —3 and 0g = —2.15, respectively, and the degen- 
eracy between unknown offsets at all frequency bands 
and the foreground zero-level. In order to obtain reliable 
one-dimensional marginal estimates of each component 
individually, one must either make sure that the data 
have sufficient power to resolve the model, or impose pri- 
ors to break the degeneracies. Fortunately, since only the 
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sum over all foregrounds is relevant to the actual CMB 
reconstruction, these issues are of little concern to the 
cosmological interpretation of the data. 

The primary target of this work is Planck, scheduled 
to be launched in late 2008, and which will observe the 
microwave sky at nine frequencies from 30 to 857 GHz. 
Combined with the five WMAP frequencies, these four- 
teen sky maps will constitute an outstanding data set 
for both cosmological and Galactic studies. Using the 
methods described in this paper, it will be possible to 
constrain three or four foreground components pixcl-by- 
pixel, and even more if adopting spatial templates as 
priors (e.g., the H Q template as a tracer for free- free). 

A more immediate application is the analysis of the 3- 
yr WMAP tem perature data, which is presented in a sep- 
arate Letter by lEriksen et al.l ([2007c). As demonstrated 
in the present paper, the method is fully capable of ex- 
tracting the valuable cosmological signal at large angular 
scales from this data set, both in terms of the CMB power 
spectrum and cosmological parameters. Further, a very 
general foreground model may be constrained to within 
a few percent in all parameters. 

In its present form, the method assumes identical beam 
responses at each frequency band. This limits its appli- 
cation to the lowest angular resolution of a given data 
set. However, this is not a fundamental limitation of the 
method, but only of our current implementation. Specif- 
ically, it is straightforward to rewrite the sampling equa- 
tions presented in S|3]to handle the foreground amplitudes 
in spherical harmonic space, similar to the current treat- 
ment of the CMB sky signal. In that case, convolution 
with individual beams at each frequency poses no prob- 
lem. An added bonus is that one may then optionally 
also either estimate or impose a spatial power spectrum 
on the foregrounds, just as for the CMB component. The 
implementation of this is left for future work. 



Still, as demonstrated in this paper, even with the cur- 
rent implementation we are able to perform a complete 
Bayesian joint CMB and foreground analysis with the 
3-yr WMAP temperature data at £ < 30-50. Simple 
extrapolation with respect to angular resolution and in- 
strumental noise suggests that we will be able to do at 
least as well for Planck up to £ ~ 100-200, beyond which 
other and simpler approaches may be applicable. We will 
quantify these projections in greater detail in an upcom- 
ing publication where we apply the method to state-of- 
the-art Planck-based simulations. An important part of 
this work is to quantify the impact of modelling errors. 

To conclude, given the recent successes of the Gibbs 
sampling approach for analyzing both temperature and 
polarization CMB data, and now also including realis- 
tic foreground modelling, we believe that this method, 
or generalizations thereof, should be the baseline analy- 
sis strategy for Planck at large angular scales. We also 
note that its computational efficiency and unparallelled 
capabilities for propagating systematic uncertainties is a 
combination that will prove extremely valuable to future 
CMB experiments. 
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APPENDIX 

THE TEMPLATE AMPLITUDE COUPLING MATRIX 
described the sampling algorithm for the conditional 



In Section 13.2.11 we 
P(s, a^i, bj, Ck\Ce,9k,d). This involved calculating the joint mean, 



x = A 



£„/»fjN-M 



Gaussian distribution 



(Al) 



and multiplying with the inverse covariance matrix, 

S- 1 + A'N^A 

A~ 1 = ' 



T*N- X A 
F*N _1 A 
G*N- X A 



A t N -i T 
T t N -i T 

F*N _1 T 
G*N _1 T 



A'N _1 F 
T t N -i F 

F t N~ 1 F 
G*N _1 F 



A'N _1 G 
T*N _1 G 
F'N^G 
G'N _1 G 



(A2) 



We now write each element in this matrix explicitly, for the benefit of readers who want to implement the algorithm 
themselves. The elements in the first row of blocks are given by, 



S" 1 + A'NT 1 A = S" 1 + AtN- x A v 



(A3) 



A t N -i T = A t N -i tvJ 
A'N^F^AtN- 1 /^ 

V 



(A4) 
(A5) 

(A6) 
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The upper part of the second row is 



T*N- X T = tljN-H^k 
T'N^G^t^N^g^;^ 



(A7) 
(A8) 

(A9) 



Note that in the above, T T N _1 T is the block containing the second derivatives of the log density with respect to 
the amplitudes {pbv,j, ciu',k), but by the assumed independence of noise at different frequency channels, the terms with 
v 7^ v' vanish. The elements of the upper part of the third row are 



F'N^G = /jM^N-^Ci/; 9 k ), 

and, finally, for the last block on the diagonal we have 

G*N- X G = Bj("5 Oj^gkiv; 6 k ). 

JEFFREYS' IGNORANCE PRIOR FOR A SPECTRAL INDEX 



(A10) 
(All) 

(A12) 



In £14.21 it was shown that to obtain unbiased parameter estimates based on marginal statistics for a power-law 
foreground model, it is necessary to adopt a proper ignorance prior for the spectral index. In this Appendix we derive 
the full expression for this prior, including noise and the antenna-to-thermodynamic conversion factor, a(y). 

The data model is in this case 





d v = Aa(y) 



(Bl) 



where n v is a noise term with variance a 2 = (n 2 ,)- We assume no noise correlations between frequencies. The 
log-likelihood then reads 

'd v -Aa(v)(v/v x Y\ 2 



-21n£(/3)cx^ 

V 

Jeffreys' ignorance prior is defined by 



d 2 InC 

d 2 (3 



(B3) 



where the averaging brackets denote an ensemble average. We therefore differentiate the log-likelihood twice and find 



' d 2 (3 



2ii 



In 2 



Aa(v) 



Aa(v) 



hr 



(B4) 



Taking the ensemble average of this expression simply means setting (d v ) = Aa{y)(y I v-\)$ , and the second term 
therefore vanishes, 



d 2 ln£ 

d 2 {3 



E 



A 2 a{vf 



2/i 



hr 



(B5) 



neglecting irrelevant constants. Thus, the full expression for Jeffreys' ignorance prior for a spectral index (3 reads 



a(v) 2 ( v 



2,1 



In' 



(B6) 



TEMPLATE ORTHOGONALITY CONSTRAINTS 



Most of the discussion in §4] concerned the degeneracy between free offsets at each frequency band and the zero- 
level of an foreground component with a free amplitude at each pixel. If neither is known, it is possible to add an 
arbitrary constant to all amplitudes, and subtract a correspondingly scaled value from each free offset, essentially 
without affecting the final x 2 - 

To break this degeneracy, two approaches were proposed. First, if external calibration is possible at one or more 
frequencies, such information should be exploited and conditioned upon. The second approach was to demand that 
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the free offsets do not have frequency component similar to that of the foreground, by effectively fitting out the 
corresponding spectrum from the offsets. 

These issues apply more generally to any fixed template with a free amplitude at all frequency bands. For most 
typical applications, this includes also three unknown dipole components, in addition to the familiar offset or monopole. 
In the following, we simply consider a general collection of templates denoted by T, which is an -ZV p i X x N t matrix 
consisting of Nt templates listed in its columns. Coupled to this, we define a coefficient vector b. u — {a„.i} containing 
the template amplitudes for each frequency and template. The sky response at frequency v is thus given by Ta^. 

We now derive the joint orthogonality constraints for Nt templates, for a sky model that includes two components 
with a free amplitude at each pixel, namely a CMB sky signal and a foreground component with a given frequency 
spectrum. We denote the vectors of free template coefficients for the CMB and foreground terms by b and c, respec- 
tively, and the foreground spectrum is defined by the N p i X x iV p i X diagonal matrix having entries equal to g(v; 9) 
on the diagonal. Note that the frequency spectrum of the CMB component is constant, and the corresponding matrix 
is therefore the identity. It is omitted in the following. 

With this notation, the x 2 to be minimized reads 

X 2 = K T * - b * Tt - c * F t T *) N^ 1 ( Ta - Tb - TF„c) (CI) 

Equating the derivatives of this expression with respect to b and c to zero gives 

dx 2 



dh 



2^T*N,; 1 (Ta-Tb-TF I ,c) = (C2) 



^ = -2V F'T'N: 1 (Ta - Tb - TF„c) = (C3) 
oc 

v 

These two sets of equations provide a general expression for a as a function of b and c, and at this point we have done 
nothing more than performed a partial change of basis. 

We now impose the prior that breaks the degeneracy between the template and the foreground amplitudes, by 
requiring c = 0. The two above equations then has a unique solution. In particular, a is given by 

£(B„ - DC-%)a = 0, (C4) 

V 

where we, for notational transparency, have defined four ancillary matrices 

A„ = T^N^T (C5) 
B = p*x t N~ 1 T (C6) 

C = X>„ (C7) 
D=£b„. (C8) 

V 

Note that Equation IC4I corresponds to iVt separate constraints on a, and, in particular, each row of the matrix in 
parentheses defines one orthogonality vector q„. These constraints are thus imposed in the CG solver using the same 
projection operator method that was described in ^3.2.11 

However, we once again stress that this approach corresponds to imposing a very strong prior that is not realized 
in practice: If there are indeed random fluctuations in the unknown offsets, then these will have a component that 
happens to have a spectrum similar to the foregrounds. This component will in the present approach be interpreted 
as a foreground signal. Further, the sampler is by this constraint not allowed to explore the joint distribution between 
the two components. In other words, both the marginal zero-level and offset uncertainties will be under-estimated by 
this approach. (Note that most other parameters, such as the CMB signal, are very weakly affected by this, because 
they only depend on the sum of the two components, not each of the two separately.) 

That being said, for experiments where this constraint is required (e.g., differential observatories such as WMAP), 
it is difficult indeed to construct an alternative and more self-consistent approach. For example, the WMAP team 
adopted a method based on a co-secant fit to a very crude, plane parallel galaxy model. In their case, n o uncertainties 
were q uoted at all. This specific issue is considered in further detail in the actual 3-yr WMAP analysis (|Eriksen et al.l 
[20073). 
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